1 Introduction
This HTML notebook will replicate the STATA do file named vnm_mics14_dp2019 for distribution in R. The goal is to create an R script that does what the said do file does.
I will follow the same section numbering as in the do file for ease of comparision.
Goes without saying, all mistakes in this document are mine.
2 Libraries
Before we go into the excercise, following are the packages from which we will use various functions.
## here() starts at D:/R projects/OPHI/Vietnam Translation/Vietnam-MICS-MPI-to-R
## -- Attaching packages --------
## v ggplot2 3.3.0 v purrr 0.3.4
## v tibble 3.0.1 v dplyr 0.8.5
## v tidyr 1.0.3 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.5.0
## Warning: package 'ggplot2' was built under R version 3.6.3
## Warning: package 'tibble' was built under R version 3.6.3
## Warning: package 'purrr' was built under R version 3.6.3
## Warning: package 'dplyr' was built under R version 3.6.3
## Warning: package 'forcats' was built under R version 3.6.3
## -- Conflicts -----------------
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## Warning: package 'janitor' was built under R version 3.6.3
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
## Warning: package 'gt' was built under R version 3.6.3
##
## Attaching package: 'magrittr'
## The following object is masked from 'package:purrr':
##
## set_names
## The following object is masked from 'package:tidyr':
##
## extract
3 Building a Well-Being Index from Vietnam MICS 2013-14
The memory and environment clearing commands and the commands for setting working folders and paths are not needed if one is working in Project mode through RStudio and uses the {here} (read as package here). This is a library for managing paths and directories.
4 Vietnam MICS 2014
4.1 Step 1: Data Preparation
__ Selecting main variables from CH, WM, HH & MN recode & merging with HL recode __
It should be noted that anthropometric data was not collected for children under 5 as part of the Viet Nam MICS 2014 dataset. Previously, nutrition data was collected as part of Viet Nam MICS 2011. However, the data was not collected in this round due to time and resource constraints as well as the availability of national nutrition survey data (p.61)
Above comments are copied from the STATA do file
4.1.1 Step 1.1: CH - Children’s Recode (Under 5)
According to the STATA do file there is no data for this section.
4.1.2 Step 1.2: BH - Birth Recode (All females 15-49 years who ever gave birth)
The purpose of step 1.2 is to identify children of any age who died in the last 5 years prior to the survey date. As seen in the STATA file
Loading the data from bh.sav.
bh_dat <- read_sav(file = here("Viet Nam_MICS5_Datasets",
"Viet Nam MICS 2013-14 SPSS Datasets",
"bh.sav"))
bh_dat <- clean_names(bh_dat)The above code chunk loads the bh.sav and names it bh_dat(a data object). The clean names function gets all the variable names in lower snake case. IN case if anyone is wondering what are all the possible cases, please refere to the wonderful art by Allison Horst shown below.
Various Cases (Let me know your favourite)
Now let us take a glimplse at the data and names of the variables.
## [1] "hh1" "hh2" "ln" "bhln" "bh2" "bh3"
## [7] "bh4m" "bh4y" "bh5" "bh6" "bh7" "bh8"
## [13] "bh9u" "bh9n" "bh10" "bh4c" "bh4f" "bh9c"
## [19] "bh9f" "hh6" "hh7" "wdoi" "wdob" "ethnicity"
## [25] "welevel" "brthord" "magebrt" "birthint" "wmweight" "wscore"
## [31] "windex5" "wscoreu" "windex5u" "wscorer" "windex5r" "windex2"
| hh1 | hh2 | ln | bhln | bh2 | bh3 | bh4m | bh4y | bh5 | bh6 | bh7 | bh8 | bh9u | bh9n | bh10 | bh4c | bh4f | bh9c | bh9f | hh6 | hh7 | wdoi | wdob | ethnicity | welevel | brthord | magebrt | birthint | wmweight | wscore | windex5 | wscoreu | windex5u | wscorer | windex5r | windex2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 2 | 1 | 1 | 1 | 4 | 2011 | 1 | 2 | 1 | 4 | NA | NA | NA | 1336 | 1 | NA | NA | 1 | 1 | 1368 | 1044 | 1 | 4 | 1 | 2 | 1 | 2.167035 | 1.640778 | 5 | 1.337577 | 5 | NA | NA | 2 |
| 1 | 2 | 2 | 2 | 1 | 1 | 5 | 2013 | 1 | 0 | 1 | 5 | NA | NA | 2 | 1361 | 1 | NA | NA | 1 | 1 | 1368 | 1044 | 1 | 4 | 2 | 2 | 2 | 2.167035 | 1.640778 | 5 | 1.337577 | 5 | NA | NA | 2 |
| 1 | 3 | 2 | 1 | 1 | 2 | 3 | 2003 | 1 | 10 | 1 | 4 | NA | NA | NA | 1239 | 1 | NA | NA | 1 | 1 | 1368 | 939 | 1 | 3 | 1 | 2 | 1 | 2.167035 | 1.427797 | 5 | 1.071892 | 5 | NA | NA | 2 |
| 1 | 3 | 2 | 2 | 1 | 1 | 6 | 2007 | 1 | 6 | 1 | 5 | NA | NA | 2 | 1290 | 1 | NA | NA | 1 | 1 | 1368 | 939 | 1 | 3 | 2 | 2 | 4 | 2.167035 | 1.427797 | 5 | 1.071892 | 5 | NA | NA | 2 |
| 1 | 4 | 2 | 1 | 1 | 2 | 11 | 2007 | 1 | 6 | 1 | 4 | NA | NA | NA | 1295 | 1 | NA | NA | 1 | 1 | 1368 | 1008 | 1 | 4 | 1 | 2 | 1 | 2.167035 | 1.613568 | 5 | 1.303632 | 5 | NA | NA | 2 |
| 1 | 6 | 2 | 1 | 1 | 1 | 3 | 1995 | 1 | 18 | 1 | 3 | NA | NA | NA | 1143 | 1 | NA | NA | 1 | 1 | 1368 | 788 | 1 | 4 | 1 | 2 | 1 | 2.167035 | 1.624926 | 5 | 1.317802 | 5 | NA | NA | 2 |
Now let us create variables, only those that are non grouped, i.e. those generated without using bysort. In R one can create grouped varibales in a different manner. Will show that in the code chunk following this one.
Before moving forward, please note that, a large part of coding involved in creating this index is a data wrangling process. The tidyverse way of carrying out data wrangling extensively employs {dplyr} (read as ‘package dplyr’) and its mutate function. Here is a visual cue (created by Allison Horst) to what the mutate function does.
Wrangling and adding columns
bh_dat <- bh_dat %>%
mutate(
ind_id = structure(((hh1*100000)+(hh2*100)+ln),
label = "Individual ID"),
date_death = structure(bh4c + bh9c,
label = "Date of Death CMC"),
mdead_survey = structure(ifelse(((bh9c == 0 |is.na(bh9c)) & bh5 == 1),
NA,
wdoi - date_death),
label = "Months Dead from Survey"),
ydead_survey = structure(mdead_survey/12,
label = "Years dead from Survey"),
age_death = structure(ifelse(bh5 == 2, bh9c, NA),
label = "Age at death in Months"),
child_died = structure(forcats::as_factor(
ifelse(is.na(bh5),NA,
ifelse(bh5==1,"child is alive","child is dead"))))
) %>%
group_by(ind_id) %>%
mutate(
tot_child_died = sum(child_died == "child is dead", na.rm = T)
) %>%
ungroup() %>%
mutate(
child18_died = forcats::as_factor(
ifelse((!is.na(age_death) & age_death > 216),
ifelse(is.na(bh5),NA,
ifelse(bh5==1,
"child is alive",
"child is dead")),
"child is alive"))
) %>%
group_by(ind_id) %>%
mutate(
tot_child18_died_5y = sum(
child18_died[ydead_survey <= 5 & !is.na(ydead_survey)] == "child is dead", na.rm = T)
) %>%
ungroup() %>%
mutate(
tot_child18_died_5y = ifelse(
(is.na(tot_child18_died_5y) & tot_child_died>=0 & !is.na(tot_child_died)),0,tot_child18_died_5y),
tot_child18_died_5y = ifelse(
child18_died == "child is dead" & is.na(ydead_survey),
NA, tot_child18_died_5y)
) %>%
group_by(ind_id) %>%
mutate(
childu18_dies_per_wom_5y = structure(
max(tot_child18_died_5y,na.rm = T),
label = "Total child under 18 death for each women in the last 5 years (birth recode)")
) %>%
ungroup()Check if all ind ids are uniuqe
## [1] FALSE
There are duplicate ind_ids. Let us remove duplicates.
Check again if all ind ids are uniuqe
## [1] TRUE
Creating Tables for new variables check.
| age_death | n | percent | valid_percent |
|---|---|---|---|
| 0 | 111 | 0.0156316012 | 0.493333333 |
| 1 | 12 | 0.0016899028 | 0.053333333 |
| 2 | 6 | 0.0008449514 | 0.026666667 |
| 3 | 5 | 0.0007041262 | 0.022222222 |
| 4 | 1 | 0.0001408252 | 0.004444444 |
| 5 | 3 | 0.0004224757 | 0.013333333 |
| 6 | 7 | 0.0009857767 | 0.031111111 |
| 7 | 2 | 0.0002816505 | 0.008888889 |
| 8 | 4 | 0.0005633009 | 0.017777778 |
| 9 | 2 | 0.0002816505 | 0.008888889 |
| 11 | 2 | 0.0002816505 | 0.008888889 |
| 12 | 8 | 0.0011266019 | 0.035555556 |
| 14 | 3 | 0.0004224757 | 0.013333333 |
| 18 | 1 | 0.0001408252 | 0.004444444 |
| 23 | 2 | 0.0002816505 | 0.008888889 |
| 24 | 11 | 0.0015490776 | 0.048888889 |
| 36 | 11 | 0.0015490776 | 0.048888889 |
| 48 | 5 | 0.0007041262 | 0.022222222 |
| 72 | 5 | 0.0007041262 | 0.022222222 |
| 84 | 2 | 0.0002816505 | 0.008888889 |
| 96 | 1 | 0.0001408252 | 0.004444444 |
| 108 | 4 | 0.0005633009 | 0.017777778 |
| 120 | 1 | 0.0001408252 | 0.004444444 |
| 144 | 1 | 0.0001408252 | 0.004444444 |
| 168 | 1 | 0.0001408252 | 0.004444444 |
| 180 | 1 | 0.0001408252 | 0.004444444 |
| 192 | 2 | 0.0002816505 | 0.008888889 |
| 204 | 1 | 0.0001408252 | 0.004444444 |
| 216 | 2 | 0.0002816505 | 0.008888889 |
| 228 | 3 | 0.0004224757 | 0.013333333 |
| 240 | 2 | 0.0002816505 | 0.008888889 |
| 264 | 1 | 0.0001408252 | 0.004444444 |
| 276 | 1 | 0.0001408252 | 0.004444444 |
| 312 | 1 | 0.0001408252 | 0.004444444 |
| NA | 6876 | 0.9683143219 | NA |
| bh5 | child is alive | child is dead |
|---|---|---|
| 1 | 6876 | 0 |
| 2 | 0 | 225 |
| child18_died | n | percent |
|---|---|---|
| child is alive | 7093 | 0.998873398 |
| child is dead | 8 | 0.001126602 |
## tot_child_died 0 1
## 0 6647 0
## 1 382 7
## 2 55 0
## 3 6 2
## 4 2 0
Writing new files.
4.1.3 Step 1.3: WM - Women’s Record (All eligible females 15-49 in the household)
Loading data and setting case for variable (column) names.
wm_dat <- read_sav(file = here("Viet Nam_MICS5_Datasets",
"Viet Nam MICS 2013-14 SPSS Datasets",
"wm.sav"))
wm_dat <- clean_names(wm_dat)Mutating the data for creating new variables. This is what one would refere to generating new varibles in STATA, I think.
Note, for the varible marital or marital_wom, I miss to see whyn the STATA code defines it as marital to begin with and renames it to marital_wom. I will define this varibale as marital_wom from the begining itself. Kindly bring to notice if this is incorrect.
wm_dat <- wm_dat %>%
mutate(
ind_id = structure(((hh1*100000)+(hh2*100)+ln),
label = "Individual ID"),
women_wm = 1,
marital_wom = structure(
forcats::as_factor(
case_when(
mstatus == 3 & is.na(ma6) ~ "Never Married",
mstatus == 1 & is.na(ma6) ~ "Currently Married",
mstatus == 2 & ma6 == 1 ~ "Widowed",
mstatus == 2 & ma6 == 2 ~ "Divorced",
mstatus == 2 & ma6 == 3 ~ "Seperated/Not living together",
)
), label = "Marital Status of Household Member"
)
) While creating new variables the STATA code has various tables. I will generate those here.
## [1] TRUE
This means all Individual Ids are unique, i.e., every row represents a unique individual.
| cm1 | 1 | 2 | NA_ |
|---|---|---|---|
| 1 | 454 | 6647 | 0 |
| 2 | 0 | 2726 | 0 |
| NA | 0 | 0 | 363 |
## # A tibble: 10 x 2
## mstatus ma6
## <dbl+lbl> <dbl+lbl>
## 1 1 [Currently married/in union] NA
## 2 1 [Currently married/in union] NA
## 3 1 [Currently married/in union] NA
## 4 3 [Never married/in union] NA
## 5 3 [Never married/in union] NA
## 6 3 [Never married/in union] NA
## 7 3 [Never married/in union] NA
## 8 1 [Currently married/in union] NA
## 9 1 [Currently married/in union] NA
## 10 3 [Never married/in union] NA
| mstatus | 1 | 2 | 3 | NA_ |
|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 6972 |
| 2 | 207 | 193 | 107 | 0 |
| 3 | 0 | 0 | 0 | 2348 |
| NA | 0 | 0 | 0 | 363 |
| marital_wom | n | percent | valid_percent |
|---|---|---|---|
| Currently Married | 6972 | 0.68420020 | 0.70947390 |
| Never Married | 2348 | 0.23042198 | 0.23893355 |
| Widowed | 207 | 0.02031403 | 0.02106441 |
| Divorced | 193 | 0.01894014 | 0.01963977 |
| Seperated/Not living together | 107 | 0.01050049 | 0.01088837 |
| NA | 363 | 0.03562316 | NA |
| ma6 | Currently Married | Never Married | Widowed | Divorced | Seperated/Not living together | NA_ |
|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 207 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 193 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 107 | 0 |
| NA | 6972 | 2348 | 0 | 0 | 0 | 363 |
| mstatus | Currently Married | Never Married | Widowed | Divorced | Seperated/Not living together | NA_ |
|---|---|---|---|---|---|---|
| 1 | 6972 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 207 | 193 | 107 | 0 |
| 3 | 0 | 2348 | 0 | 0 | 0 | 0 |
| NA | 0 | 0 | 0 | 0 | 0 | 363 |
Keeping relevant columns.
Writing the tables in csv.
4.1.4 Step 1.4: MR - MEn’s Recode (All eligible men in the household)
As mentioned in the STATA code, Note: There is no male recode file for Viet Nam MICS 2014. Hence the commands under this section have been remove
4.1.5 Step 1.5 HH - Household Record (All Households intereviewed)
Load data.
hh_dat <- read_sav(file = here("Viet Nam_MICS5_Datasets",
"Viet Nam MICS 2013-14 SPSS Datasets",
"hh.sav"))
hh_dat <- clean_names(hh_dat)Creating new variables.
write data in csv.
4.1.6 step 1.6: HL - Household Member
Load data.
hl_dat <- read_sav(file = here("Viet Nam_MICS5_Datasets",
"Viet Nam MICS 2013-14 SPSS Datasets",
"hl.sav"))
hl_dat <- clean_names(hl_dat)Creating new variables.
4.1.7 Step 1.7: Data Merging
Unlike the STATA code the merging here will look a little different. It will be in a single flow. However, checks will be shown prior to mrunnig the merge command.
hl_dat %>%
left_join(bh_dat, by = c("ind_id" = "ind_id")) %>%
left_join(wm_dat, by = c("ind_id" = "ind_id")) %>%
tabyl(hl7, show_na = T)## hl7 n percent
## 0 29190 0.7412392077
## 1 1065 0.0270441849
## 2 4650 0.1180802438
## 3 2037 0.0517267649
## 4 1479 0.0375571356
## 5 541 0.0137379380
## 6 228 0.0057897410
## 7 110 0.0027932961
## 8 38 0.0009649568
## 9 27 0.0006856272
## 10 9 0.0002285424
## 11 3 0.0000761808
## 13 2 0.0000507872
## 15 1 0.0000253936
hl_dat %>%
left_join(bh_dat, by = c("ind_id" = "ind_id")) %>%
left_join(wm_dat, by = c("ind_id" = "ind_id")) %>%
mutate(temp = hl7 > 0) %>%
tabyl(women_wm, temp, show_na = T)## women_wm FALSE TRUE
## 1 0 10190
## NA 29190 0
hl_dat %>%
left_join(bh_dat, by = c("ind_id" = "ind_id")) %>%
left_join(wm_dat, by = c("ind_id" = "ind_id")) %>%
mutate(temp = hl7 > 0) %>%
filter((temp == 1 & is.na(women_wm))) %>%
tabyl(wm7, show_na = T)## [1] wm7 n percent
## <0 rows> (or 0-length row.names)
hl_dat %>%
left_join(bh_dat, by = c("ind_id" = "ind_id")) %>%
left_join(wm_dat, by = c("ind_id" = "ind_id")) %>%
left_join(hh_dat, by = c("hh_id" = "hh_id")) %>%
filter(hh9 ==1) %>% # keeping only households that completed interview
mutate(
marital_men = structure(NA, label = "Marital Status of household member")
) -> merged_datWriting this object for safe keeping.
4.1.8 Step 1.8: Control Variables
Comments from the STATA code:
Households are identified as having ‘no eligible’ members if there are no applicable population, that is, children 0-5 years, adult women 15-49 years or adult men. These households will not have information on relevant indicators of health. As such, these households are considered as non-deprived in those relevant indicators.
As one must have noticed from previous sections, I prefer not to gererate variables that are grouped in the same data frame. However, in the following sections I will follow as the code in STATA do file. This is because the code seems to mutate the same data frame starting from section Step 1.6.
Having said this, I should explain why I am averse to this practice. Consider the following example.
tibble::tibble(
a = c(1,2,3,4,5,6),
b= rep(letters[1:3],2)
) -> data_example ## data for illustration
data_example## # A tibble: 6 x 2
## a b
## <dbl> <chr>
## 1 1 a
## 2 2 b
## 3 3 c
## 4 4 a
## 5 5 b
## 6 6 c
Say, we want sum of column a for each type of observation in column b. We shall name this new variable c.
## # A tibble: 6 x 3
## # Groups: b [3]
## a b c
## <dbl> <chr> <dbl>
## 1 1 a 5
## 2 2 b 7
## 3 3 c 9
## 4 4 a 5
## 5 5 b 7
## 6 6 c 9
Above is the output we would generate if we wanted a sum of ‘a’ grouped by ‘b’. This might end up being misleading to someone who has neve seen the code but want to use the data. Some might do a simple average of cloumn c or try other summary statistics over this new column.
Moving on.
4.1.8.1 No Eligible Women 15-49 years
merged_dat %>%
mutate(
fem_eligible = ifelse(hl7 > 0 & !is.na(hl7), TRUE, NA), ## if hl7 is greater than 0, it cannot be NA, but I will stick to the STATA code for now
) %>%
group_by(hh_id) %>%
mutate(
hh_n_fem_eligible = sum(fem_eligible, na.rm = T)
) %>%
ungroup() %>%
mutate(
no_fem_eligible = structure(hh_n_fem_eligible == 0, label = "Household has no eligible women")
) %>%
tabyl(no_fem_eligible, show_na = T) %>%
gt()| no_fem_eligible | n | percent |
|---|---|---|
| FALSE | 33847 | 0.8594972 |
| TRUE | 5533 | 0.1405028 |
4.1.8.2 No Men Eligible
merged_dat %>%
mutate(
fem_eligible = ifelse(hl7 > 0 & !is.na(hl7), TRUE, NA), ## if hl7 is greater than 0, it cannot be NA, but I will stick to the STATA code for now
) %>%
group_by(hh_id) %>%
mutate(
hh_n_fem_eligible = sum(fem_eligible, na.rm = T)
) %>%
ungroup() %>%
mutate(
no_fem_eligible = structure(hh_n_fem_eligible == 0, label = "Household has no eligible women"),
no_male_eligible = structure(NA, label = "Household has no eligible man")
) %>%
tabyl(no_male_eligible, show_na = T) %>%
gt()| no_male_eligible | n | percent | valid_percent |
|---|---|---|---|
| NA | 39380 | 1 | NA |
4.1.8.3 NO Eligible Children 0-5 years
merged_dat %>%
mutate(
fem_eligible = ifelse(hl7 > 0 & !is.na(hl7), TRUE, NA), ## if hl7 is greater than 0, it cannot be NA, but I will stick to the STATA code for now
) %>%
group_by(hh_id) %>%
mutate(
hh_n_fem_eligible = sum(fem_eligible, na.rm = T)
) %>%
ungroup() %>%
mutate(
no_fem_eligible = structure(hh_n_fem_eligible == 0, label = "Household has no eligible women"),
no_male_eligible = structure(NA, label = "Household has no eligible man"),
no_child_eligible = structure(NA, label = "Household has no children eligible")
) %>%
tabyl(no_child_eligible) %>%
gt()| no_child_eligible | n | percent | valid_percent |
|---|---|---|---|
| NA | 39380 | 1 | NA |
4.1.8.4 No Eligible Women and Men
merged_dat %>%
mutate(
fem_eligible = ifelse(hl7 > 0 & !is.na(hl7), TRUE, NA), ## if hl7 is greater than 0, it cannot be NA, but I will stick to the STATA code for now
) %>%
group_by(hh_id) %>%
mutate(
hh_n_fem_eligible = sum(fem_eligible, na.rm = T)
) %>%
ungroup() %>%
mutate(
no_fem_eligible = structure(hh_n_fem_eligible == 0, label = "Household has no eligible women"),
no_male_eligible = structure(NA, label = "Household has no eligible man"),
no_child_eligible = structure(NA, label = "Household has no children eligible"),
no_adults_eligible = structure(NA,
label = "Household has no eligible women and men")
) %>%
tabyl(no_adults_eligible, show_na = T) %>%
gt()| no_adults_eligible | n | percent | valid_percent |
|---|---|---|---|
| NA | 39380 | 1 | NA |
4.1.8.5 NO Eligible Children and Women
merged_dat %>%
mutate(
fem_eligible = ifelse(hl7 > 0 & !is.na(hl7), TRUE, NA), ## if hl7 is greater than 0, it cannot be NA, but I will stick to the STATA code for now
) %>%
group_by(hh_id) %>%
mutate(
hh_n_fem_eligible = sum(fem_eligible, na.rm = T)
) %>%
ungroup() %>%
mutate(
no_fem_eligible = structure(hh_n_fem_eligible == 0, label = "Household has no eligible women"),
no_male_eligible = structure(NA, label = "Household has no eligible man"),
no_child_eligible = structure(NA, label = "Household has no children eligible"),
no_adults_eligible = structure(NA,
label = "Household has no eligible women and men"),
no_child_fem_eligible = structure(NA,
label = "Household has no children or women eligible")
) %>%
tabyl(no_child_fem_eligible, show_na = T) %>%
gt()| no_child_fem_eligible | n | percent | valid_percent |
|---|---|---|---|
| NA | 39380 | 1 | NA |
4.1.8.6 No Eligible Women, Men or Children
merged_dat %>%
mutate(
fem_eligible = ifelse(hl7 > 0 & !is.na(hl7), TRUE, NA), ## if hl7 is greater than 0, it cannot be NA, but I will stick to the STATA code for now
) %>%
group_by(hh_id) %>%
mutate(
hh_n_fem_eligible = sum(fem_eligible, na.rm = T)
) %>%
ungroup() %>%
mutate(
no_fem_eligible = structure(hh_n_fem_eligible == 0, label = "Household has no eligible women"),
no_male_eligible = structure(NA, label = "Household has no eligible man"),
no_child_eligible = structure(NA, label = "Household has no children eligible"),
no_adults_eligible = structure(NA,
label = "Household has no eligible women and men"),
no_child_fem_eligible = structure(NA,
label = "Household has no children or women eligible"),
no_eligibles = structure(NA,
label = "Household has no eligible women, men or children")
) %>%
tabyl(no_eligibles, show_na = T) %>%
gt()| no_eligibles | n | percent | valid_percent |
|---|---|---|---|
| NA | 39380 | 1 | NA |
4.1.8.7 No Eligible Subsample
merged_dat %>%
mutate(
fem_eligible = ifelse(hl7 > 0 & !is.na(hl7), TRUE, NA), ## if hl7 is greater than 0, it cannot be NA, but I will stick to the STATA code for now
) %>%
group_by(hh_id) %>%
mutate(
hh_n_fem_eligible = sum(fem_eligible, na.rm = T)
) %>%
ungroup() %>%
mutate(
no_fem_eligible = structure(hh_n_fem_eligible == 0, label = "Household has no eligible women"),
no_male_eligible = structure(NA, label = "Household has no eligible man"),
no_child_eligible = structure(NA, label = "Household has no children eligible"),
no_adults_eligible = structure(NA,
label = "Household has no eligible women and men"),
no_child_fem_eligible = structure(NA,
label = "Household has no children or women eligible"),
no_eligibles = structure(NA,
label = "Household has no eligible women, men or children"),
no_hem_eligible = structure(NA, label = "Household has no eligible individuals for hemoglobin measurement")
) %>%
select(-c("fem_eligible", "hh_n_fem_eligible"))## # A tibble: 39,380 x 259
## hh1.x hh2.x hl1 hl3 hl4 hl5m hl5y hl6 hl7 hl7b
## <dbl> <dbl> <dbl> <dbl+l> <dbl+l> <dbl+l> <dbl> <dbl> <dbl+l> <dbl+l>
## 1 1 1 1 1 [Hea~ 1 [Mal~ 5 1964 49 0 [Not~ 0 [Not~
## 2 1 1 2 2 [Wif~ 2 [Fem~ 6 1963 50 0 [Not~ 0 [Not~
## 3 1 1 3 3 [Son~ 1 [Mal~ 10 1998 15 0 [Not~ 0 [Not~
## 4 1 2 1 1 [Hea~ 2 [Fem~ 9 1957 56 0 [Not~ 0 [Not~
## 5 1 2 2 3 [Son~ 2 [Fem~ 12 1986 27 2 0 [Not~
## 6 1 2 3 4 [Son~ 1 [Mal~ 11 1983 30 0 [Not~ 0 [Not~
## 7 1 2 4 5 [Gra~ 1 [Mal~ 4 2011 2 0 [Not~ 4
## 8 1 2 5 5 [Gra~ 1 [Mal~ 5 2013 0 0 [Not~ 5
## 9 1 3 1 1 [Hea~ 2 [Fem~ 98 [DK] 1943 70 0 [Not~ 0 [Not~
## 10 1 3 2 3 [Son~ 2 [Fem~ 3 1978 35 2 0 [Not~
## # ... with 39,370 more rows, and 249 more variables: hl11 <dbl+lbl>,
## # hl12 <dbl+lbl>, hl12a <dbl+lbl>, hl13 <dbl+lbl>, hl14 <dbl+lbl>,
## # hl14a <dbl+lbl>, hl15 <dbl>, ed1 <dbl>, ed3 <dbl+lbl>, ed4a <dbl+lbl>,
## # ed4b <dbl+lbl>, ed5 <dbl+lbl>, ed6a <dbl+lbl>, ed6b <dbl+lbl>,
## # ed7 <dbl+lbl>, ed8a <dbl+lbl>, ed8b <dbl+lbl>, mp3 <dbl>, mp5 <dbl+lbl>,
## # mp6 <dbl+lbl>, mp7 <dbl+lbl>, hh5d.x <dbl>, hh5m.x <dbl>, hh5y.x <dbl>,
## # hh6.x <dbl+lbl>, hh7.x <dbl+lbl>, mline <dbl+lbl>, fline <dbl+lbl>,
## # ethnicity.x <dbl+lbl>, helevel.x <dbl+lbl>, melevel <dbl+lbl>,
## # felevel <dbl+lbl>, schage <dbl+lbl>, hhweight.x <dbl>, wscore.x <dbl>,
## # windex5.x <dbl+lbl>, wscoreu.x <dbl>, windex5u.x <dbl+lbl>,
## # wscorer.x <dbl>, windex5r.x <dbl+lbl>, hh_id <dbl>, ind_id <dbl>,
## # hh1.y <dbl>, hh2.y <dbl>, ln <dbl>, bhln <dbl>, bh2 <dbl+lbl>,
## # bh3 <dbl+lbl>, bh4m <dbl+lbl>, bh4y <dbl+lbl>, bh5 <dbl+lbl>,
## # bh6 <dbl+lbl>, bh7 <dbl+lbl>, bh8 <dbl+lbl>, bh9u <dbl+lbl>,
## # bh9n <dbl+lbl>, bh10 <dbl+lbl>, bh4c <dbl>, bh4f <dbl+lbl>, bh9c <dbl>,
## # bh9f <dbl+lbl>, hh6.y <dbl+lbl>, hh7.y <dbl+lbl>, wdoi <dbl>, wdob <dbl>,
## # ethnicity.y <dbl+lbl>, welevel <dbl+lbl>, brthord <dbl+lbl>,
## # magebrt <dbl+lbl>, birthint <dbl+lbl>, wmweight <dbl>, wscore.y <dbl>,
## # windex5.y <dbl+lbl>, wscoreu.y <dbl>, windex5u.y <dbl+lbl>,
## # wscorer.y <dbl>, windex5r.y <dbl+lbl>, windex2 <dbl>, date_death <dbl>,
## # mdead_survey <dbl>, ydead_survey <dbl>, age_death <dbl>, child_died <fct>,
## # tot_child_died <int>, child18_died <fct>, tot_child18_died_5y <int>,
## # childu18_dies_per_wom_5y <int>, wm7 <dbl+lbl>, cm1 <dbl+lbl>,
## # cm8 <dbl+lbl>, cm9a <dbl>, cm9b <dbl>, women_wm <dbl>, marital_wom <fct>,
## # hh1 <dbl>, hh2 <dbl>, hh3 <dbl>, hh4 <dbl>, hh5d.y <dbl>, hh5m.y <dbl>, ...
4.1.9 Step 1.9: Renaming Demographic Variables
merged_dat %>%
mutate(
fem_eligible = ifelse(hl7 > 0 & !is.na(hl7), TRUE, NA), ## if hl7 is greater than 0, it cannot be NA, but I will stick to the STATA code for now
) %>%
group_by(hh_id) %>%
mutate(
hh_n_fem_eligible = sum(fem_eligible, na.rm = T)
) %>%
ungroup() %>%
mutate(
no_fem_eligible = structure(hh_n_fem_eligible == 0, label = "Household has no eligible women"),
no_male_eligible = structure(NA, label = "Household has no eligible man"),
no_child_eligible = structure(NA, label = "Household has no children eligible"),
no_adults_eligible = structure(NA,
label = "Household has no eligible women and men"),
no_child_fem_eligible = structure(NA,
label = "Household has no children or women eligible"),
no_eligibles = structure(NA,
label = "Household has no eligible women, men or children"),
no_hem_eligible = structure(NA, label = "Household has no eligible individuals for hemoglobin measurement")
) %>%
select(-c("fem_eligible", "hh_n_fem_eligible")) %>%
mutate(
weight = structure(hhweight.x, label = "Sample Weight"), # Sample Weight
area = structure(ifelse(hh6.x == 2,0,hh6.x),
label = "Area: urban-rural",
labels = c("rural","urban")), #Area: urban or rural||factors in
#R start with 1, to maintain #consistency with stata code #creating these integers as #numerics with levels
sex = structure(hl4,label = "Sex of Household member"), # Sex oh HH member
age = structure(ifelse(hl6>=98,NA,hl6),
label = "Age of household member"), #Age of HH member
agec7 =
structure(
factor(
case_when(
age >= 0 & age <= 4 ~ "0-4",
age >= 5 & age <= 9 ~ "5-9",
age >= 10 & age <= 14 ~ "10-14",
age >= 15 & age <= 17 ~ "15-17",
age >= 18 & age <= 59 ~ "18-59",
age >= 60 ~ "60+"
), levels = c("0-4","5-9","10-14","15-17","18-59","60+")
), label = "Age groups (7groups)"
), # Age group
agec4 =
structure(
factor(
case_when(
age >= 0 & age <= 9 ~ "0-9",
age >= 10 & age <= 17 ~ "10-17",
age >= 18 & age <= 58 ~ "18-59",
age >= 60 ~ "60+"
), levels = c("0-9","10-17","18-59","60+")
), label = "Age groups (4groups)"
), # Age group
member = 1
) %>%
group_by(hh_id) %>%
mutate(
hhsize = structure(sum(member), label = "Household size")
) %>%
ungroup() %>%
tabyl(hhsize, show_na = T) %>%
gt()| hhsize | n | percent |
|---|---|---|
| 1 | 613 | 0.0155662773 |
| 2 | 2872 | 0.0729304215 |
| 3 | 5892 | 0.1496190960 |
| 4 | 11296 | 0.2868461148 |
| 5 | 7860 | 0.1995937024 |
| 6 | 5220 | 0.1325545962 |
| 7 | 2485 | 0.0631030980 |
| 8 | 1288 | 0.0327069578 |
| 9 | 765 | 0.0194261046 |
| 10 | 550 | 0.0139664804 |
| 11 | 231 | 0.0058659218 |
| 12 | 96 | 0.0024377857 |
| 13 | 104 | 0.0026409345 |
| 14 | 14 | 0.0003555104 |
| 15 | 30 | 0.0007618080 |
| 16 | 64 | 0.0016251905 |
Note that for varible agec7, there are only 6 groups but the label in STATA code says 7 groups. Following the STATA code.
merged_dat %>%
mutate(
fem_eligible = ifelse(hl7 > 0 & !is.na(hl7), TRUE, NA), ## if hl7 is greater than 0, it cannot be NA, but I will stick to the STATA code for now
) %>%
group_by(hh_id) %>%
mutate(
hh_n_fem_eligible = sum(fem_eligible, na.rm = T)
) %>%
ungroup() %>%
mutate(
no_fem_eligible = structure(hh_n_fem_eligible == 0, label = "Household has no eligible women"),
no_male_eligible = structure(NA, label = "Household has no eligible man"),
no_child_eligible = structure(NA, label = "Household has no children eligible"),
no_adults_eligible = structure(NA,
label = "Household has no eligible women and men"),
no_child_fem_eligible = structure(NA,
label = "Household has no children or women eligible"),
no_eligibles = structure(NA,
label = "Household has no eligible women, men or children"),
no_hem_eligible = structure(NA, label = "Household has no eligible individuals for hemoglobin measurement")
) %>%
select(-c("fem_eligible", "hh_n_fem_eligible")) %>%
mutate(
weight = structure(hhweight.x, label = "Sample Weight"), # Sample Weight
area = structure(ifelse(hh6.x == 2,0,hh6.x),
label = "Area: urban-rural",
labels = c("rural","urban")), #Area: urban or rural||factors in
#R start with 1, to maintain #consistency with stata code #creating these integers as #numerics with levels
sex = structure(hl4,label = "Sex of Household member"), # Sex oh HH member
age = structure(ifelse(hl6>=98,NA,hl6),
label = "Age of household member"), #Age of HH member
agec7 =
structure(
factor(
case_when(
age >= 0 & age <= 4 ~ "0-4",
age >= 5 & age <= 9 ~ "5-9",
age >= 10 & age <= 14 ~ "10-14",
age >= 15 & age <= 17 ~ "15-17",
age >= 18 & age <= 59 ~ "18-59",
age >= 60 ~ "60+"
), levels = c("0-4","5-9","10-14","15-17","18-59","60+")
), label = "Age groups (7groups)"
), # Age group
agec4 =
structure(
factor(
case_when(
age >= 0 & age <= 9 ~ "0-9",
age >= 10 & age <= 17 ~ "10-17",
age >= 18 & age <= 58 ~ "18-59",
age >= 60 ~ "60+"
), levels = c("0-9","10-17","18-59","60+")
), label = "Age groups (4groups)"
), # Age group
member = 1
) %>%
group_by(hh_id) %>%
mutate(
hhsize = structure(sum(member), label = "Household size")
) %>%
ungroup() %>%
select(-c("member")) %>%
mutate(
region = structure(forcats::fct_relevel(
fct_recode(as_factor(hh7.x),
"North Central & Central Coastal Area" =
"North Central and Central Coastal area",
"Northern Midlands & Mountain Area" =
"Northern Midlands and Mountain area"),
c("Central Highlands",
"Mekong River Delta",
"North Central & Central Coastal Area",
"Northern Midlands & Mountain Area",
"Red River Delta",
"South East")),
label = "Region for Subnational Decomposition")
) %>%
tabyl(hh7.x, region) %>%
gt()| hh7.x | Central Highlands | Mekong River Delta | North Central & Central Coastal Area | Northern Midlands & Mountain Area | Red River Delta | South East |
|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 | 6198 | 0 |
| 2 | 0 | 0 | 0 | 6736 | 0 | 0 |
| 3 | 0 | 0 | 6134 | 0 | 0 | 0 |
| 4 | 7179 | 0 | 0 | 0 | 0 | 0 |
| 5 | 0 | 0 | 0 | 0 | 0 | 6748 |
| 6 | 0 | 6385 | 0 | 0 | 0 | 0 |
Notice how leveling of factors is significantly different in R. The {Forcats} is a very useful tool.
4.2 Step 2: Data preparation
Standardization of the 10 Global MPI Indicatiors.
Identification if non-deprived and deprived individuals.
4.2.1 Step 2.1: Years of Schooling
Comments from the STATA Code:
Note: In Viet Nam, children enter primary school aged 6 years, enter lower secondary school at 11 and upper secondary school at 15. There are grades in primary school (Grades 1 to 5), four in lower secondary school (Grades 6 to 9) and three in upper secondary school (Grades 10 to 12).(pg 192 report)
Let us first save the data frame created in the environment. This way we reduce the length of the code and copy pasting it again and again.
merged_dat %>%
mutate(
fem_eligible = ifelse(hl7 > 0 & !is.na(hl7), TRUE, NA), ## if hl7 is greater than 0, it cannot be NA, but I will stick to the STATA code for now
) %>%
group_by(hh_id) %>%
mutate(
hh_n_fem_eligible = sum(fem_eligible, na.rm = T)
) %>%
ungroup() %>%
mutate(
no_fem_eligible = structure(hh_n_fem_eligible == 0, label = "Household has no eligible women"),
no_male_eligible = structure(NA, label = "Household has no eligible man"),
no_child_eligible = structure(NA, label = "Household has no children eligible"),
no_adults_eligible = structure(NA,
label = "Household has no eligible women and men"),
no_child_fem_eligible = structure(NA,
label = "Household has no children or women eligible"),
no_eligibles = structure(NA,
label = "Household has no eligible women, men or children"),
no_hem_eligible = structure(NA, label = "Household has no eligible individuals for hemoglobin measurement")
) %>%
select(-c("fem_eligible", "hh_n_fem_eligible")) %>%
mutate(
weight = structure(hhweight.x, label = "Sample Weight"), # Sample Weight
area = structure(ifelse(hh6.x == 2,0,hh6.x),
label = "Area: urban-rural",
labels = c("rural","urban")), #Area: urban or rural||factors in
#R start with 1, to maintain #consistency with stata code #creating these integers as #numerics with levels
sex = structure(hl4,label = "Sex of Household member"), # Sex oh HH member
age = structure(ifelse(hl6>=98,NA,hl6),
label = "Age of household member"), #Age of HH member
agec7 =
structure(
factor(
case_when(
age >= 0 & age <= 4 ~ "0-4",
age >= 5 & age <= 9 ~ "5-9",
age >= 10 & age <= 14 ~ "10-14",
age >= 15 & age <= 17 ~ "15-17",
age >= 18 & age <= 59 ~ "18-59",
age >= 60 ~ "60+"
), levels = c("0-4","5-9","10-14","15-17","18-59","60+")
), label = "Age groups (7groups)"
), # Age group
agec4 =
structure(
factor(
case_when(
age >= 0 & age <= 9 ~ "0-9",
age >= 10 & age <= 17 ~ "10-17",
age >= 18 & age <= 58 ~ "18-59",
age >= 60 ~ "60+"
), levels = c("0-9","10-17","18-59","60+")
), label = "Age groups (4groups)"
), # Age group
member = 1
) %>%
group_by(hh_id) %>%
mutate(
hhsize = structure(sum(member), label = "Household size")
) %>%
ungroup() %>%
select(-c("member")) %>%
mutate(
region = structure(forcats::fct_relevel(
fct_recode(as_factor(hh7.x),
"North Central & Central Coastal Area" =
"North Central and Central Coastal area",
"Northern Midlands & Mountain Area" =
"Northern Midlands and Mountain area"),
c("Central Highlands",
"Mekong River Delta",
"North Central & Central Coastal Area",
"Northern Midlands & Mountain Area",
"Red River Delta",
"South East")),
label = "Region for Subnational Decomposition")
) -> merged_datTable ed4b ed4a
| ed4b | 0 | 1 | 2 | 3 | 4 | 5 | 8 | NA_ |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 799 | 984 | 715 | 0 | 0 | 1 | 0 |
| 1 | 0 | 1062 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 1590 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 1848 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 1762 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | 0 | 2145 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | 0 | 0 | 1802 | 0 | 0 | 0 | 0 | 0 |
| 7 | 0 | 0 | 2270 | 0 | 0 | 0 | 0 | 0 |
| 8 | 0 | 0 | 1741 | 0 | 0 | 0 | 0 | 0 |
| 9 | 0 | 0 | 5350 | 0 | 0 | 0 | 0 | 0 |
| 10 | 0 | 0 | 0 | 1225 | 0 | 0 | 0 | 0 |
| 11 | 0 | 0 | 0 | 1008 | 0 | 0 | 0 | 0 |
| 12 | 0 | 0 | 0 | 3533 | 0 | 0 | 0 | 0 |
| 97 | 0 | 0 | 1 | 2 | 0 | 0 | 0 | 0 |
| 98 | 0 | 2 | 2 | 4 | 0 | 0 | 8 | 0 |
| 99 | 0 | 2 | 1 | 4 | 0 | 0 | 8 | 0 |
| NA | 734 | 0 | 0 | 0 | 1553 | 3677 | 0 | 5547 |
Table age ed6a when ed5 is 1
| age | 0 | 1 | 2 | 3 | 4 | 5 | 9 |
|---|---|---|---|---|---|---|---|
| 5 | 623 | 5 | 0 | 0 | 0 | 0 | 1 |
| 6 | 71 | 624 | 0 | 0 | 0 | 0 | 0 |
| 7 | 3 | 658 | 1 | 0 | 0 | 0 | 0 |
| 8 | 1 | 664 | 0 | 0 | 0 | 0 | 3 |
| 9 | 0 | 661 | 0 | 0 | 0 | 0 | 1 |
| 10 | 0 | 664 | 5 | 0 | 0 | 0 | 1 |
| 11 | 1 | 115 | 481 | 0 | 0 | 0 | 0 |
| 12 | 0 | 32 | 550 | 0 | 0 | 0 | 1 |
| 13 | 0 | 8 | 634 | 0 | 0 | 0 | 0 |
| 14 | 0 | 8 | 527 | 4 | 0 | 0 | 0 |
| 15 | 0 | 2 | 128 | 316 | 1 | 0 | 0 |
| 16 | 0 | 1 | 24 | 457 | 1 | 0 | 1 |
| 17 | 0 | 1 | 8 | 462 | 3 | 1 | 0 |
| 18 | 0 | 0 | 1 | 102 | 22 | 133 | 0 |
| 19 | 0 | 0 | 0 | 24 | 32 | 145 | 1 |
| 20 | 0 | 0 | 1 | 5 | 42 | 152 | 0 |
| 21 | 0 | 0 | 0 | 3 | 15 | 134 | 0 |
| 22 | 0 | 0 | 0 | 1 | 7 | 102 | 0 |
| 23 | 0 | 0 | 0 | 0 | 14 | 58 | 0 |
| 24 | 0 | 0 | 0 | 0 | 4 | 26 | 1 |
Creating edulevel and table ed4a, ed3
merged_dat %>%
mutate(
edulevel = ifelse(ed4a >= 8, NA, ed4a),
edulevel = ifelse((is.na(ed3) | ed3 == 2),0,edulevel)
) %>%
tabyl(ed4a, ed3, show_na = T) %>%
gt() | ed4a | 1 | 2 | NA_ |
|---|---|---|---|
| 0 | 734 | 0 | 0 |
| 1 | 9210 | 0 | 0 |
| 2 | 12151 | 0 | 0 |
| 3 | 6491 | 0 | 0 |
| 4 | 1553 | 0 | 0 |
| 5 | 3677 | 0 | 0 |
| 8 | 17 | 0 | 0 |
| NA | 0 | 2201 | 3346 |
Creating eduhighyear and table ed4b, ed43
merged_dat %>%
mutate(
edulevel = ifelse(ed4a >= 8, NA, ed4a),
edulevel = ifelse((is.na(ed3) | ed3 == 2),0,edulevel),
eduhighyear = ifelse((is.na(ed4b)|ed4b == 97| ed4b == 98| ed4b == 99),
NA, ed4b),
eduhighyear = ifelse((is.na(ed3) | ed3 == 2),0,eduhighyear)
) %>%
tabyl(ed4b, ed3, show_na = T) %>%
gt()| ed4b | 1 | 2 | NA_ |
|---|---|---|---|
| 0 | 2499 | 0 | 0 |
| 1 | 1062 | 0 | 0 |
| 2 | 1590 | 0 | 0 |
| 3 | 1848 | 0 | 0 |
| 4 | 1762 | 0 | 0 |
| 5 | 2145 | 0 | 0 |
| 6 | 1802 | 0 | 0 |
| 7 | 2270 | 0 | 0 |
| 8 | 1741 | 0 | 0 |
| 9 | 5350 | 0 | 0 |
| 10 | 1225 | 0 | 0 |
| 11 | 1008 | 0 | 0 |
| 12 | 3533 | 0 | 0 |
| 97 | 3 | 0 | 0 |
| 98 | 16 | 0 | 0 |
| 99 | 15 | 0 | 0 |
| NA | 5964 | 2201 | 3346 |
Now, I feel it is as good a time as any, since we have moved this forward in to mutating new variables, to introduce the {magrittr} pipes. Up till now in the document you must have seen %>% appear everywhere. This is a pipe operator. What it does is to allow us to write code where the format is similar to f(g(y(x))), where f,g, and y are function, and x is input for y and y(x) is input of g and so on.
One must have noticed that if I am mutating multiple varibables of a data frame I ahve to reassign the mutated data frame with -> or <- operator. This can be redundant at times. Consider the following example.
let us say I want a new variable c = a*b in the same data frame ex_pipe and keep rows where b is 11. Below is code to do it with using pipe operator.
ex_pipe %>%
mutate(
c= a * b # creating new variable
) %>% # using pipe to feed the output of mutate in to filter function
filter(
b == 11
) -> ex_pipe # updating the ex_pipe object/ data frame by reassigning.
ex_pipe## # A tibble: 3 x 3
## a b c
## <dbl> <dbl> <dbl>
## 1 1 11 11
## 2 3 11 33
## 3 4 11 44
I might not want to use -> operator as it can get redundant. There is a cousin of the %>% operator that helps us tackle this problem. For instance, say i want ot add 1 to c in each row and update the data prame ex_pipe.
ex_pipe %<>% ## Notice the difference
mutate(
c = c+1
) ##notice no need for arrow opration and reassignment
ex_pipe## # A tibble: 3 x 3
## a b c
## <dbl> <dbl> <dbl>
## 1 1 11 12
## 2 3 11 34
## 3 4 11 45
We will be using %<>% where necessary.
Ok. Back on track now.
updating merged_dat.
merged_dat %<>%
mutate(
edulevel = ifelse(ed4a >= 8, NA, ed4a),
edulevel = ifelse((is.na(ed3) | ed3 == 2),0,edulevel),
eduhighyear = ifelse((is.na(ed4b)|ed4b == 97| ed4b == 98| ed4b == 99),
NA, ed4b),
eduhighyear = structure(ifelse((is.na(ed3) | ed3 == 2),0,eduhighyear),
label = "HIghest year of education completed")
) Table eduhighyear
| eduhighyear | n | percent | valid_percent |
|---|---|---|---|
| 0 | 8046 | 0.20431691 | 0.24102810 |
| 1 | 1062 | 0.02696800 | 0.03181355 |
| 2 | 1590 | 0.04037583 | 0.04763046 |
| 3 | 1848 | 0.04692737 | 0.05535918 |
| 4 | 1762 | 0.04474352 | 0.05278294 |
| 5 | 2145 | 0.05446927 | 0.06425619 |
| 6 | 1802 | 0.04575927 | 0.05398119 |
| 7 | 2270 | 0.05764347 | 0.06800072 |
| 8 | 1741 | 0.04421026 | 0.05215386 |
| 9 | 5350 | 0.13585576 | 0.16026601 |
| 10 | 1225 | 0.03110716 | 0.03669642 |
| 11 | 1008 | 0.02559675 | 0.03019591 |
| 12 | 3533 | 0.08971559 | 0.10583548 |
| NA | 5998 | 0.15231082 | NA |
Cleaning inconsistencies.
merged_dat %<>%
mutate(
eduhighyear = ifelse(age < 10, 0, eduhighyear ),
eduhighyear = ifelse((edulevel == 1 & eduhighyear > 5), NA, eduhighyear ),
eduhighyear = ifelse((edulevel == 2 & eduhighyear > 9), NA, eduhighyear ),
eduhighyear = ifelse((edulevel == 3 & eduhighyear > 12), NA, eduhighyear ),
eduhighyear = ifelse(edulevel == 0, 0, eduhighyear ),
)Creating Years of schooling.
merged_dat %<>%
mutate(
eduyears = ifelse((edulevel == 1 & is.na(eduhighyear)),0, eduhighyear),
eduyears = ifelse((edulevel == 2 & (is.na(eduhighyear)|eduhighyear ==0)),
5,eduyears),
eduyears = ifelse((edulevel == 3 & (is.na(eduhighyear)|eduhighyear ==0)),
9,eduyears),
eduyears = ifelse((edulevel == 4 & (is.na(eduhighyear)|eduhighyear ==0)),
12,eduyears), # the equivalent STATA code fot this is written twice in the do file.
eduyears = ifelse(((edulevel == 4 | edulevel ==5) &
(is.na(eduhighyear)|eduhighyear ==0)),
13,eduyears),
eduyears = ifelse(edulevel == 0 ,0, eduyears),
eduyears = ifelse(is.na(edulevel) ,NA, eduyears),
)Checking for further inconsistencies.
merged_dat %<>%
mutate(
eduyears = ifelse((age <= eduyears & age > 0), NA, eduyears),
eduyears = structure(ifelse(age < 10, 0, eduyears),
label = "Total number of years of education acomplished")
)
# Table eduyears edulevel
merged_dat %>%
tabyl(eduyears, edulevel, show_na = T) %>%
kableExtra::kable()| eduyears | 0 | 1 | 2 | 3 | 4 | 5 | NA_ |
|---|---|---|---|---|---|---|---|
| 0 | 6281 | 2707 | 1 | 0 | 0 | 0 | 0 |
| 1 | 0 | 356 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 951 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 1298 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 1753 | 0 | 0 | 0 | 0 | 0 |
| 5 | 0 | 2145 | 987 | 0 | 0 | 0 | 0 |
| 6 | 0 | 0 | 1802 | 0 | 0 | 0 | 0 |
| 7 | 0 | 0 | 2270 | 0 | 0 | 0 | 0 |
| 8 | 0 | 0 | 1741 | 0 | 0 | 0 | 0 |
| 9 | 0 | 0 | 5350 | 725 | 0 | 0 | 0 |
| 10 | 0 | 0 | 0 | 1225 | 0 | 0 | 0 |
| 11 | 0 | 0 | 0 | 1008 | 0 | 0 | 0 |
| 12 | 0 | 0 | 0 | 3533 | 0 | 0 | 0 |
| 13 | 0 | 0 | 0 | 0 | 1553 | 3677 | 0 |
| NA | 0 | 0 | 0 | 0 | 0 | 0 | 17 |
Table ed4b age
| ed4b | 7 |
|---|---|
| 0 | 1 |
Creating Control variables regarding information on years of education.
merged_dat %<>%
mutate(
temp = ifelse((!is.na(eduyears) & age >= 10 & !is.na(age)), 1, NA)
) %>%
group_by(hh_id) %>%
mutate(
no_missing_edu = sum(temp, na.rm = T)
) %>%
ungroup() %>%
mutate(
temp2 = ifelse((age >= 10 & !is.na(age)),1,NA)
) %>%
group_by(hh_id) %>%
mutate(
hhs = sum(temp2, na.rm = T)
) %>%
ungroup() %>%
mutate(
no_missing_edu = no_missing_edu/hhs,
no_missing_edu = structure(no_missing_edu >= (2/3),
label = "No missing edu for at least 2/3 of the HH members aged 10 years & older")
)
# Table for no_missing_edu
merged_dat %>%
tabyl(no_missing_edu, show_na = T) %>%
gt()| no_missing_edu | n | percent |
|---|---|---|
| FALSE | 8 | 0.0002031488 |
| TRUE | 39372 | 0.9997968512 |
Dropping temp, temp2 and hhs
4.2.1.1 Standard MPI
Comments from STATA code:
The entire household is considered deprived if no household member aged 10 years or older has completed SIX years of schooling.
merged_dat %<>%
mutate(
years_edu6 = ifelse(is.na(eduyears),NA,(eduyears >= 6)),
) %>%
group_by(hh_id) %>%
mutate(
hh_years_edu6_1 = max(years_edu6, na.rm = T)
) %>%
ungroup() %>%
mutate(
hh_years_edu6 = (hh_years_edu6_1 == 1),
hh_years_edu6 = ifelse(is.na(hh_years_edu6_1), NA, hh_years_edu6),
hh_years_edu6 = structure(
ifelse((hh_years_edu6 == 0 & no_missing_edu == 0),
NA, hh_years_edu6),
label = "Household has at least one member with 6 years of edu")
)
# Table hh_years_edu6
merged_dat %>%
tabyl(hh_years_edu6, show_na = T) %>%
gt()| hh_years_edu6 | n | percent |
|---|---|---|
| FALSE | 3439 | 0.08732859 |
| TRUE | 35941 | 0.91267141 |
4.2.2 Step 2.2: Child School Attendance
Creating the attendence variable.
merged_dat %<>%
mutate(
attendance = ifelse(ed5 == 1, 1, NA),
attendance = ifelse(ed5 == 2, 0, attendance),
attendance = ifelse(ed3 == 2, 0, attendance),
)
# Table age ed5
merged_dat %>%
tabyl(age, ed5, show_na = T) %>%
gt()| age | 1 | 2 | 9 | NA_ |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 732 |
| 1 | 0 | 0 | 0 | 787 |
| 2 | 0 | 0 | 0 | 634 |
| 3 | 0 | 0 | 0 | 553 |
| 4 | 0 | 0 | 0 | 640 |
| 5 | 629 | 4 | 1 | 34 |
| 6 | 695 | 3 | 0 | 12 |
| 7 | 662 | 1 | 0 | 9 |
| 8 | 668 | 6 | 0 | 9 |
| 9 | 662 | 3 | 0 | 6 |
| 10 | 670 | 5 | 0 | 9 |
| 11 | 597 | 12 | 0 | 9 |
| 12 | 583 | 14 | 0 | 7 |
| 13 | 642 | 45 | 0 | 11 |
| 14 | 539 | 61 | 0 | 14 |
| 15 | 447 | 103 | 1 | 7 |
| 16 | 484 | 159 | 0 | 11 |
| 17 | 475 | 202 | 3 | 18 |
| 18 | 258 | 327 | 4 | 9 |
| 19 | 202 | 339 | 3 | 18 |
| 20 | 200 | 363 | 2 | 18 |
| 21 | 152 | 369 | 7 | 21 |
| 22 | 110 | 460 | 5 | 20 |
| 23 | 72 | 540 | 13 | 22 |
| 24 | 31 | 518 | 17 | 20 |
| 25 | 0 | 0 | 0 | 595 |
| 26 | 0 | 0 | 0 | 605 |
| 27 | 0 | 0 | 0 | 528 |
| 28 | 0 | 0 | 0 | 618 |
| 29 | 0 | 0 | 0 | 604 |
| 30 | 0 | 0 | 0 | 651 |
| 31 | 0 | 0 | 0 | 634 |
| 32 | 0 | 0 | 0 | 575 |
| 33 | 0 | 0 | 0 | 569 |
| 34 | 0 | 0 | 0 | 574 |
| 35 | 0 | 0 | 0 | 540 |
| 36 | 0 | 0 | 0 | 574 |
| 37 | 0 | 0 | 0 | 600 |
| 38 | 0 | 0 | 0 | 550 |
| 39 | 0 | 0 | 0 | 580 |
| 40 | 0 | 0 | 0 | 552 |
| 41 | 0 | 0 | 0 | 583 |
| 42 | 0 | 0 | 0 | 576 |
| 43 | 0 | 0 | 0 | 570 |
| 44 | 0 | 0 | 0 | 579 |
| 45 | 0 | 0 | 0 | 612 |
| 46 | 0 | 0 | 0 | 498 |
| 47 | 0 | 0 | 0 | 514 |
| 48 | 0 | 0 | 0 | 542 |
| 49 | 0 | 0 | 0 | 571 |
| 50 | 0 | 0 | 0 | 567 |
| 51 | 0 | 0 | 0 | 566 |
| 52 | 0 | 0 | 0 | 449 |
| 53 | 0 | 0 | 0 | 561 |
| 54 | 0 | 0 | 0 | 508 |
| 55 | 0 | 0 | 0 | 485 |
| 56 | 0 | 0 | 0 | 441 |
| 57 | 0 | 0 | 0 | 383 |
| 58 | 0 | 0 | 0 | 341 |
| 59 | 0 | 0 | 0 | 370 |
| 60 | 0 | 0 | 0 | 298 |
| 61 | 0 | 0 | 0 | 321 |
| 62 | 0 | 0 | 0 | 222 |
| 63 | 0 | 0 | 0 | 272 |
| 64 | 0 | 0 | 0 | 237 |
| 65 | 0 | 0 | 0 | 225 |
| 66 | 0 | 0 | 0 | 167 |
| 67 | 0 | 0 | 0 | 172 |
| 68 | 0 | 0 | 0 | 132 |
| 69 | 0 | 0 | 0 | 143 |
| 70 | 0 | 0 | 0 | 150 |
| 71 | 0 | 0 | 0 | 155 |
| 72 | 0 | 0 | 0 | 123 |
| 73 | 0 | 0 | 0 | 154 |
| 74 | 0 | 0 | 0 | 125 |
| 75 | 0 | 0 | 0 | 131 |
| 76 | 0 | 0 | 0 | 125 |
| 77 | 0 | 0 | 0 | 105 |
| 78 | 0 | 0 | 0 | 110 |
| 79 | 0 | 0 | 0 | 92 |
| 80 | 0 | 0 | 0 | 110 |
| 81 | 0 | 0 | 0 | 104 |
| 82 | 0 | 0 | 0 | 78 |
| 83 | 0 | 0 | 0 | 113 |
| 84 | 0 | 0 | 0 | 105 |
| 85 | 0 | 0 | 0 | 62 |
| 86 | 0 | 0 | 0 | 51 |
| 87 | 0 | 0 | 0 | 47 |
| 88 | 0 | 0 | 0 | 32 |
| 89 | 0 | 0 | 0 | 22 |
| 90 | 0 | 0 | 0 | 27 |
| 91 | 0 | 0 | 0 | 21 |
| 92 | 0 | 0 | 0 | 23 |
| 93 | 0 | 0 | 0 | 20 |
| 94 | 0 | 0 | 0 | 14 |
| 95 | 0 | 0 | 0 | 29 |
merged_dat %<>%
mutate(
attendance = ifelse((age < 5 | age > 24),0,attendance)
)
# Table Attendance
merged_dat %>%
tabyl(attendance, show_na = T) %>%
gt()| attendance | n | percent | valid_percent |
|---|---|---|---|
| 0 | 30546 | 0.775672930 | 0.7767775 |
| 1 | 8778 | 0.222905028 | 0.2232225 |
| NA | 56 | 0.001422042 | NA |
4.2.2.1 Standard MPI
Comments from STATA code:
The entire household is considered deprived if any school-aged child is not attending school up to class 8.
merged_dat %<>%
mutate(
child_schoolage = (age >= 6 & age <= 14)
)
# count If child_schoolage ==1 & attendance ==.
merged_dat %>%
filter(child_schoolage == 1 & is.na(attendance))## # A tibble: 0 x 276
## # ... with 276 variables: hh1.x <dbl>, hh2.x <dbl>, hl1 <dbl>, hl3 <dbl+lbl>,
## # hl4 <dbl+lbl>, hl5m <dbl+lbl>, hl5y <dbl+lbl>, hl6 <dbl+lbl>,
## # hl7 <dbl+lbl>, hl7b <dbl+lbl>, hl11 <dbl+lbl>, hl12 <dbl+lbl>,
## # hl12a <dbl+lbl>, hl13 <dbl+lbl>, hl14 <dbl+lbl>, hl14a <dbl+lbl>,
## # hl15 <dbl>, ed1 <dbl>, ed3 <dbl+lbl>, ed4a <dbl+lbl>, ed4b <dbl+lbl>,
## # ed5 <dbl+lbl>, ed6a <dbl+lbl>, ed6b <dbl+lbl>, ed7 <dbl+lbl>,
## # ed8a <dbl+lbl>, ed8b <dbl+lbl>, mp3 <dbl>, mp5 <dbl+lbl>, mp6 <dbl+lbl>,
## # mp7 <dbl+lbl>, hh5d.x <dbl>, hh5m.x <dbl>, hh5y.x <dbl>, hh6.x <dbl+lbl>,
## # hh7.x <dbl+lbl>, mline <dbl+lbl>, fline <dbl+lbl>, ethnicity.x <dbl+lbl>,
## # helevel.x <dbl+lbl>, melevel <dbl+lbl>, felevel <dbl+lbl>,
## # schage <dbl+lbl>, hhweight.x <dbl>, wscore.x <dbl>, windex5.x <dbl+lbl>,
## # wscoreu.x <dbl>, windex5u.x <dbl+lbl>, wscorer.x <dbl>,
## # windex5r.x <dbl+lbl>, hh_id <dbl>, ind_id <dbl>, hh1.y <dbl>, hh2.y <dbl>,
## # ln <dbl>, bhln <dbl>, bh2 <dbl+lbl>, bh3 <dbl+lbl>, bh4m <dbl+lbl>,
## # bh4y <dbl+lbl>, bh5 <dbl+lbl>, bh6 <dbl+lbl>, bh7 <dbl+lbl>, bh8 <dbl+lbl>,
## # bh9u <dbl+lbl>, bh9n <dbl+lbl>, bh10 <dbl+lbl>, bh4c <dbl>, bh4f <dbl+lbl>,
## # bh9c <dbl>, bh9f <dbl+lbl>, hh6.y <dbl+lbl>, hh7.y <dbl+lbl>, wdoi <dbl>,
## # wdob <dbl>, ethnicity.y <dbl+lbl>, welevel <dbl+lbl>, brthord <dbl+lbl>,
## # magebrt <dbl+lbl>, birthint <dbl+lbl>, wmweight <dbl>, wscore.y <dbl>,
## # windex5.y <dbl+lbl>, wscoreu.y <dbl>, windex5u.y <dbl+lbl>,
## # wscorer.y <dbl>, windex5r.y <dbl+lbl>, windex2 <dbl>, date_death <dbl>,
## # mdead_survey <dbl>, ydead_survey <dbl>, age_death <dbl>, child_died <fct>,
## # tot_child_died <int>, child18_died <fct>, tot_child18_died_5y <int>,
## # childu18_dies_per_wom_5y <int>, wm7 <dbl+lbl>, cm1 <dbl+lbl>,
## # cm8 <dbl+lbl>, ...
merged_dat %<>%
mutate(
temp = ifelse((child_schoolage == 1 & (!is.na(attendance))),
1, NA)
) %>%
group_by(hh_id) %>%
mutate(
no_missing_atten = sum(temp, na.rm = T)
) %>%
ungroup() %>%
mutate(
temp2 = ifelse(child_schoolage == 1, 1, NA)
) %>%
group_by(hh_id) %>%
mutate(
hhs = sum(temp2, na.rm = T)
) %>%
ungroup() %>%
mutate(
no_missing_atten = ifelse((no_missing_atten == 0 & hhs == 0),
1,no_missing_atten/hhs), # deviation
no_missing_atten = structure((no_missing_atten >= (2/3)),
label = "No missing school attendance for at least 2/3 of the school aged children")
) %>%
select(-c("temp", "temp2", "hhs"))
# Table no_mossing_atten
merged_dat %>%
tabyl(no_missing_atten, show_na = T) %>%
gt()| no_missing_atten | n | percent |
|---|---|---|
| TRUE | 39380 | 1 |
Notice there is a deviation from STATA code : replace no_missing_aten = no_missing_atten/hhs. This is because when both variables are 0 NaNs are generated.
merged_dat %<>%
group_by(hh_id) %>%
mutate(
hh_children_schoolage = sum(child_schoolage, na.rm = TRUE)
) %>%
ungroup() %>%
mutate(
hh_children_schoolage = structure(
hh_children_schoolage > 0,
label = "Household has children in schoolage"
),
child_not_atten = ifelse(child_schoolage == 1, attendance == 0, NA),
child_not_atten = ifelse((is.na(attendance) & child_schoolage == 1),
NA, child_not_atten)
) %>%
group_by(hh_id) %>%
mutate(
any_child_not_atten = max(child_not_atten, na.rm = T)
) %>%
ungroup() %>%
mutate(
hh_child_atten = (any_child_not_atten == 0),
hh_child_atten = ifelse(is.na(any_child_not_atten),
NA, hh_child_atten),
hh_child_atten = ifelse(hh_children_schoolage ==0,
1, hh_child_atten),
hh_child_atten = structure(
ifelse((hh_child_atten == 1 & no_missing_atten == 0),
NA, hh_child_atten),
label = "Household has all school age children up to class 8 in school"
)
)
# Table hh_child_Atten
merged_dat %>%
tabyl(hh_child_atten, show_na = T) %>%
gt()| hh_child_atten | n | percent |
|---|---|---|
| 0 | 1186 | 0.03011681 |
| 1 | 38194 | 0.96988319 |
4.2.3 Step 2.3: Nutrition
Comments from STATA code:
Note: Anthropometric data was not collected for children under 5 as part of the Viet Nam MICS 2014 dataset.
4.2.4 Step 2.4: Chlid Mortality
merged_dat %<>%
mutate(
temp_f = case_when(
is.na(cm9a) & !is.na(cm9b) ~ cm9b,
!is.na(cm9a) & is.na(cm9b) ~ cm9a,
!is.na(cm9a) & !is.na(cm9b) ~ cm9b + cm9b,
),
temp_f = ifelse(
(cm1 == 1 & cm8 ==2) | cm1 == 2,
0, temp_f
),
temp_f = ifelse(no_fem_eligible == 1,0, temp_f)
) %>%
group_by(hh_id) %>%
mutate(
child_mortality_f = structure(
ifelse(
length(temp_f) == sum(is.na(temp_f)),
NA,
sum(temp_f, na.rm = T)
),
label = "Occurence of child mortality as reported by women")
) %>%
ungroup() %>%
select(-c("temp_f"))
# Table child_mortality_f
merged_dat %>%
tabyl(child_mortality_f, show_na = T) %>%
gt()| child_mortality_f | n | percent | valid_percent |
|---|---|---|---|
| 0 | 37791 | 0.9596495683 | 0.974371535 |
| 2 | 929 | 0.0235906552 | 0.023952559 |
| 4 | 59 | 0.0014982224 | 0.001521207 |
| 6 | 6 | 0.0001523616 | 0.000154699 |
| NA | 595 | 0.0151091925 | NA |
merged_dat %<>%
mutate(
child_mortality_m = structure(NA,
label = "Occurence of child mortality reported by men"),
child_mortality = structure(
child_mortality_f,
label = "Total child mortality reported by men and women")
)
# table child mortality
merged_dat %>%
tabyl(child_mortality, show_na = T)## child_mortality n percent valid_percent
## 0 37791 0.9596495683 0.974371535
## 2 929 0.0235906552 0.023952559
## 4 59 0.0014982224 0.001521207
## 6 6 0.0001523616 0.000154699
## NA 595 0.0151091925 NA
4.2.4.1 Standard MPI
Comments from STATA code:
The standard MPI indicator takes a value of “0” if women in the household reported mortality among children under 18 in the last 5 years from the survey year. The indicator takes a value of “1” if eligible women within the household reported
(i) no child mortality or
(ii) if any child died longer than 5 years from the survey year or
(iii) if any child 18 years and older died in the last 5 years. Households were replaced with a value of “1” if eligible men within the household reported no child mortality in the absence of information from women. The indicator takes a missing value if there was missing information on reported death from eligible individuals.
Table for childu18_died_5y.
comments form the STATA code:
The ‘childu18_died_per_wom_5y’ variable was constructed in Step 1.2 using information from individual women who ever gave birth in the BH file. The missing values represent eligible woman who have never ever given birth and so are not present in the BR file. But these ‘missing women’ may be living in households where there are other women with child mortality information from the BH file. So at this stage, it is important that we aggregate the information that was obtained from the BH file at the household level. This ensures that women who were not present in the BH file is assigned with a value, following the information provided by other women in the household.
| childu18_dies_per_wom_5y | n | percent | valid_percent |
|---|---|---|---|
| 0 | 7092 | 0.1800914170 | 0.998732573 |
| 1 | 9 | 0.0002285424 | 0.001267427 |
| NA | 32279 | 0.8196800406 | NA |
Assign 0 for wome who gave birth but did not report death
merged_dat %<>%
mutate(
childu18_dies_per_wom_5y = ifelse((cm1 == 1&cm8==2)|cm1 == 2,
0,childu18_dies_per_wom_5y)
) %>%
group_by(hh_id) %>%
mutate(
childu18_mortality_5y = sum(childu18_dies_per_wom_5y, na.rm = T)
) %>%
ungroup() %>%
mutate(
childu18_mortality_5y = structure(
ifelse((is.na(childu18_mortality_5y) & child_mortality == 0),
0, childu18_mortality_5y),
label = "Under 18 child mortality within household past 5 years reported by wome"
)
)
# table
merged_dat %>%
tabyl(childu18_mortality_5y, show_na = T) %>%
gt()| childu18_mortality_5y | n | percent |
|---|---|---|
| 0 | 39336 | 0.998882682 |
| 1 | 44 | 0.001117318 |
merged_dat %<>%
mutate(
hh_mortality_u18_5y = (childu18_mortality_5y == 0),
hh_mortality_u18_5y = structure(
ifelse(is.na(childu18_mortality_5y),NA, hh_mortality_u18_5y),
label = "Household had no under 18 child mortality
in the last 5 years"
)
)
# table hh_mortality_u18_5y
merged_dat %>%
tabyl(hh_mortality_u18_5y, show_na = T) %>%
gt()| hh_mortality_u18_5y | n | percent |
|---|---|---|
| FALSE | 44 | 0.001117318 |
| TRUE | 39336 | 0.998882682 |
4.2.5 Step 2.5: Electricity
Comments from STATA:
Members of the household are considered deprived if the household has no electricity
4.2.6 Step 2.6: Sanitation
Comments from STATA Code:
Improved sanitation facilities include flush or pour flush toilets to sewer systems, septic tanks or pit latrines, ventilated improved pit latrines, pit latrines with a slab, and composting toilets. These facilities are only considered improved if it is private, that is, it is not shared with other households. Source: https://unstats.un.org/sdgs/metadata/files/Metadata-06-02-01.pdf Note: In cases of mismatch between the country report and the internationally agreed guideline, we followed the report.
Creating new variables
merged_dat %<>%
mutate(
toilet = ws8,
shared_toilet = ws9,
shared_toilet = structure(
ifelse(shared_toilet == 2,0,shared_toilet),
labels = c("Yes" =1, "No" =0)
),
shared_toilet = structure(
ifelse(shared_toilet ==3, NA, shared_toilet),
labels = c("Yes" =1, "No" =0))
)4.2.6.1 Standard MPI
Comments from STATA code:
Members of the household are considered deprived if the household’s sanitation facility is not improved (according to the SDG guideline) or it is improved but shared with other households
merged_dat %<>%
mutate(
toilet_mdg = ((toilet <= 22 | toilet == 31) & shared_toilet != 1),
toilet_mdg = ifelse(
((toilet <= 22 | toilet == 31) & shared_toilet == 1),
FALSE, toilet_mdg),
toilet_mdg = structure(
ifelse((is.na(toilet)|toilet ==99), NA, toilet_mdg),
label = "Household has improved sanitation with MDG Standards"
)
)
# table toilet_mdg
merged_dat %>%
tabyl(toilet_mdg, show_na = T) %>%
gt()| toilet_mdg | n | percent |
|---|---|---|
| FALSE | 8687 | 0.2205942 |
| TRUE | 30693 | 0.7794058 |
4.2.7 Step 2.7: Drinking Water
Comments from the STATA Code:
Improved drinking water sources include the following: piped water into dwelling, yard or plot; public taps or standpipes; boreholes or tubewells; protected dug wells; protected springs; packaged water; delivered water and rainwater which is located on premises or is less than a 30-minute walk from home roundtrip. Source: https://unstats.un.org/sdgs/metadata/files/Metadata-06-01-01.pdf Note: In cases of mismatch between the country report and the internationally agreed guideline, we followed the report.
merged_dat %<>%
mutate(
water = ws1,
timetowater = ws4,
ndwater = ws2
)
# table ws2 if water is 91
merged_dat %>%
filter(water == 91) %>%
tabyl(ws2, show_na = T) %>%
gt()| ws2 | n | percent |
|---|---|---|
| 11 | 2940 | 0.4577300327 |
| 12 | 126 | 0.0196170014 |
| 13 | 55 | 0.0085629768 |
| 14 | 4 | 0.0006227619 |
| 21 | 2076 | 0.3232134517 |
| 31 | 651 | 0.1013545072 |
| 32 | 79 | 0.0122995485 |
| 41 | 80 | 0.0124552390 |
| 42 | 4 | 0.0006227619 |
| 51 | 110 | 0.0171259536 |
| 81 | 244 | 0.0379884789 |
| 96 | 54 | 0.0084072863 |
comments from STATA code:
Because the quality of bottled water is not known, households using bottled water for drinking are classified as using an improved or unimproved source according to their water source for non-drinking activities such as cooking and hand washing. However, it is important to note that households using bottled water for drinking are classified as unimproved source if this is explicitly mentioned in the country report.
4.2.7.1 Standard MPI
Comments from STATA code:
Members of the household are considered deprived if the household does not have access to improved drinking water (according to the SDG guideline) or safe drinking water is at least a 30-minute walk from home, roundtrip
merged_dat %<>%
mutate(
water_mdg = ifelse(water %in% c(11,12,14,21,13,31,41,51,91),1,NA),
water_mdg = ifelse(water %in% c(32,42,81,96),0, water_mdg),
water_mdg = ifelse(
(water_mdg == 1 & timetowater >= 30 & !is.na(timetowater) &
timetowater != 999),
0, water_mdg),
water_mdg = ifelse(water == 99 | is.na(water), NA, water_mdg),
water_mdg = structure(
ifelse(water == 91 & (ndwater %in% c(32,42,81,96)),0,water_mdg),
label = "Household has drinking water with MDG standards (considering distance)"
)
)
# table water_mdg
merged_dat %>%
tabyl(water_mdg, show_na = T) %>%
gt()| water_mdg | n | percent |
|---|---|---|
| 0 | 3356 | 0.08522092 |
| 1 | 36024 | 0.91477908 |
4.2.8 Step 2.8: Housing
Comments from STATA code:
Members of the household are considered deprived if the household has a dirt, sand or dung floor
merged_dat %<>%
mutate(
floor = hc3,
floor_imp = 1,
floor_imp = ifelse(floor == 11 | floor == 96,0, floor_imp),
floor_imp = structure(
ifelse(is.na(floor)|floor == 99, NA, floor_imp),
label = "Household has floor that it is not earth/sand/dung"
)
)
# Table floor_imp
merged_dat %>%
tabyl(floor, floor_imp, show_na = T) %>%
kableExtra::kable()| floor | 0 | 1 |
|---|---|---|
| 11 | 1664 | 0 |
| 21 | 0 | 836 |
| 22 | 0 | 541 |
| 31 | 0 | 929 |
| 32 | 0 | 17 |
| 33 | 0 | 2509 |
| 34 | 0 | 7332 |
| 35 | 0 | 29 |
| 36 | 0 | 25523 |
Comments from STATA:
Members of the household are considered deprived if the household has wall made of natural or rudimentary materials
merged_dat %<>%
mutate(
wall = hc5,
wall_imp = 1,
wall_imp = ifelse(wall <= 26|wall == 96,0, wall_imp),
wall_imp = structure(
ifelse(is.na(wall), NA, wall_imp),
label = "Household has wall that it is not of low quality materials"
)
)
# table wall and wall_imp
merged_dat %>%
tabyl(wall, wall_imp, show_na = T) %>%
kableExtra::kable()| wall | 0 | 1 |
|---|---|---|
| 11 | 42 | 0 |
| 12 | 1329 | 0 |
| 13 | 56 | 0 |
| 14 | 21 | 0 |
| 21 | 424 | 0 |
| 22 | 132 | 0 |
| 23 | 40 | 0 |
| 24 | 342 | 0 |
| 25 | 28 | 0 |
| 26 | 282 | 0 |
| 31 | 0 | 21 |
| 32 | 0 | 78 |
| 33 | 0 | 30817 |
| 34 | 0 | 1152 |
| 35 | 0 | 154 |
| 36 | 0 | 3309 |
| 96 | 1153 | 0 |
Comments from STATA code:
Members of the household are considered deprived if the household has roof made of natural or rudimentary materials
merged_dat %<>%
mutate(
roof = hc4,
roof_imp = 1,
roof_imp = ifelse(roof <= 23|roof == 96,0, roof_imp),
roof_imp = structure(
ifelse(is.na(roof), NA, roof_imp),
label = "Household has roof that it is not of low quality materials"
)
)
# table roof and roof_imp
merged_dat %>%
tabyl(roof, roof_imp, show_na = T) %>%
kableExtra::kable()| roof | 0 | 1 |
|---|---|---|
| 12 | 749 | 0 |
| 22 | 180 | 0 |
| 23 | 58 | 0 |
| 31 | 0 | 17452 |
| 33 | 0 | 3708 |
| 34 | 0 | 7779 |
| 35 | 0 | 9421 |
| 37 | 0 | 30 |
| 38 | 0 | 3 |
4.2.8.1 Standard MPI
Comments from the STATA Code:
merged_dat %<>%
mutate(
housing_1 = ifelse(
(floor_imp == 0| wall_imp == 0| roof_imp == 0),0,1
),
housing_1 = structure(
ifelse(
(is.na(floor_imp) & is.na(wall_imp) & is.na(roof_imp)),
NA,housing_1
),
label = "Household has roof, floor & walls that it is not low quality material"
)
)
# Table housing_1
merged_dat %>%
tabyl(housing_1, show_na = T) %>%
gt()| housing_1 | n | percent |
|---|---|---|
| 0 | 4793 | 0.1217115 |
| 1 | 34587 | 0.8782885 |
4.2.9 Step 2.9: Cooking Fuel
Comments from STATA code:
_Solid fuel are solid materials burned as fuels, which includes coal as well as solid biomass fuels (wood, animal dung, crop wastes and charcoal). Source: https://apps.who.int/iris/bitstream/handle/10665/141496/9789241548885_eng.pdf_
merged_dat %<>%
mutate(
cookingfuel = hc6,
cooking_mdg = 1,
cooking_mdg = ifelse(cookingfuel > 5 & cookingfuel < 95,
0,cooking_mdg),
cooking_mdg = structure(
ifelse(is.na(cookingfuel) | cookingfuel == 99,NA,cooking_mdg),
label = "Household has cooking fuel according to MDG standards"
)
)
# table cooking fuel and cooking coking_mdg
merged_dat %>%
tabyl(cookingfuel, cooking_mdg, show_na = T) %>%
kableExtra::kable()| cookingfuel | 0 | 1 |
|---|---|---|
| 1 | 0 | 628 |
| 2 | 0 | 21723 |
| 3 | 0 | 75 |
| 4 | 0 | 441 |
| 5 | 0 | 82 |
| 6 | 620 | 0 |
| 7 | 938 | 0 |
| 8 | 14326 | 0 |
| 9 | 358 | 0 |
| 11 | 129 | 0 |
| 95 | 0 | 60 |
4.2.10 Step 2.10: Assets Ownership
Comments from STATA code:
Assets that are included in the global MPI: Radio, TV, telephone, bicycle, motorbike, refrigerator, car, computer and animal cart
Creating new variables
merged_dat %<>%
mutate(
television = structure(
ifelse(hc8c == 1,1,
ifelse(hc8c == 2,0,NA)),
labels = c("Yes" = 1, "No" = 0)
),
bw_television = NA,
radio = structure(
ifelse(hc8b == 1,1,
ifelse(hc8b == 2,0,NA)),
labels = c("Yes" = 1, "No" = 0)
),
telephone = structure(
ifelse(hc8d == 1,1,
ifelse(hc8d == 2,0,NA)),
labels = c("Yes" = 1, "No" = 0)
),
mobiletelephone = structure(
ifelse(hc9b == 1,1,
ifelse(hc9b == 2,0,NA)),
labels = c("Yes" = 1, "No" = 0)
),
refrigerator = structure(
ifelse(hc8e == 1,1,
ifelse(hc8e == 2,0,NA)),
labels = c("Yes" = 1, "No" = 0)
),
car = structure(
ifelse(hc8p == 1,1,
ifelse(hc8p == 2,0,NA)),
labels = c("Yes" = 1, "No" = 0)
),
bicycle = structure(
ifelse(hc9c == 1,1,
ifelse(hc9c == 2,0,NA)),
labels = c("Yes" = 1, "No" = 0)
),
motorbike = structure(
ifelse(hc9d == 1,1,
ifelse(hc9d == 2,0,NA)),
labels = c("Yes" = 1, "No" = 0)
),
computer = structure(
ifelse(hc8j == 1,1,
ifelse(hc8j == 2,0,NA)),
labels = c("Yes" = 1, "No" = 0)
),
animal_cart = NA
)Grouping telephone and mobiletelephone as a single variable
merged_dat %<>%
mutate(
telephone = ifelse(
(telephone == 0 & mobiletelephone == 1) |
(is.na(telephone) & mobiletelephone == 1),
1, telephone
)
)4.2.10.1 Standarad MPI
Comments from STATA code:
Members of the household are considered deprived in assets if the household does not own more than one of: radio, TV, telephone, bike, motorbike, refrigerator, computer or animal cart and does not own a car or truck.
merged_dat %<>%
mutate(
n_small_assets2 = structure(
ifelse(
(is.na(television)&is.na(radio)&is.na(telephone)&
is.na(refrigerator)&is.na(bicycle)&is.na(motorbike)&
is.na(computer)&is.na(animal_cart)),
NA,rowSums(.[,c(305,307,308,310,312:315)],na.rm = T)
),
label = "Household number of small assets owned"),
hh_assets2 = (car ==1 | n_small_assets2 > 1),
hh_assets2 = structure(
ifelse(is.na(car) & is.na(n_small_assets2),NA,hh_assets2),
label = "Household Asset Ownership: HH has car or more than 1 small assets incl computer & animal cart"
)
) 4.2.11 Rename and keep variables for MPI calculations
Retain data on smapling design
Retain year, month and date of interview
merged_dat %<>%
mutate(
year_interview = hh5y.x,
month_interview = hh5m.x,
date_interview = hh5d.x
)Generate persence of subsample
Rename key global MPI indicators for estimation
merged_dat %<>%
mutate(
d_cm = ifelse(hh_mortality_u18_5y == 1,0,
ifelse(hh_mortality_u18_5y == 0,
1, hh_mortality_u18_5y)),
d_nutr = ifelse(hh_nutrition_uw_st == 1,0,
ifelse(hh_nutrition_uw_st == 0,
1, hh_nutrition_uw_st)),
d_satt = ifelse(hh_child_atten == 1,0,
ifelse(hh_child_atten == 0,
1, hh_child_atten)),
d_educ = ifelse(hh_years_edu6 == 1,0,
ifelse(hh_years_edu6 == 0,
1, hh_years_edu6)),
d_elct = ifelse(electricity == 1,0,
ifelse(electricity == 0,
1, electricity)),
d_wtr = ifelse(water_mdg == 1,0,
ifelse(water_mdg == 0,
1, water_mdg)),
d_sani = ifelse(toilet_mdg == 1,0,
ifelse(toilet_mdg == 0,
1, toilet_mdg)),
d_hsg = ifelse(housing_1 == 1,0,
ifelse(housing_1 == 0,
1, housing_1)),
d_ckfl = ifelse(cooking_mdg == 1,0,
ifelse(cooking_mdg == 0,
1, cooking_mdg)),
d_asst = ifelse(hh_assets2 == 1,0,
ifelse(hh_assets2 == 0,
1, hh_assets2))
)Writing the file in csv format.
4.2.12 Mean uncensored headcout check
# mean(merged_dat$d_asst) #1/18
# mean(merged_dat$d_ckfl) #1/18
# mean(merged_dat$d_hsg) #1/18
# mean(merged_dat$d_sani) #1/18
# mean(merged_dat$d_wtr) #1/18 ####### incorrect
# mean(merged_dat$d_elct) #1/18
# mean(merged_dat$d_educ) #1/6
# mean(merged_dat$d_satt) #1/6
# mean(merged_dat$d_cm) #1/3 ####### incorect - missing
merged_dat %>%
summarise(
mean_uncensored_H_d_Asst = mean(merged_dat$d_asst),
mean_uncensored_H_d_ckfl = mean(merged_dat$d_ckfl),
mean_uncensored_H_d_hsg = mean(merged_dat$d_hsg),
mean_uncensored_H_d_sani = mean(merged_dat$d_sani),
mean_uncensored_H_d_wtr = mean(merged_dat$d_wtr),
mean_uncensored_H_d_elct = mean(merged_dat$d_elct),
mean_uncensored_H_d_educ = mean(merged_dat$d_educ),
mean_uncensored_H_d_satt = mean(merged_dat$d_satt),
mean_uncensored_H_d_cm = mean(merged_dat$d_cm),
) %>%
transpose() %>%
unlist() %>%
as.data.frame() %>%
kableExtra::kable()| . | |
|---|---|
| mean_uncensored_H_d_Asst | 0.0233875 |
| mean_uncensored_H_d_ckfl | 0.4157186 |
| mean_uncensored_H_d_hsg | 0.1217115 |
| mean_uncensored_H_d_sani | 0.2205942 |
| mean_uncensored_H_d_wtr | 0.0852209 |
| mean_uncensored_H_d_elct | 0.0096242 |
| mean_uncensored_H_d_educ | 0.0873286 |
| mean_uncensored_H_d_satt | 0.0301168 |
| mean_uncensored_H_d_cm | 0.0011173 |